AITopics

Technology: Information Technology > Artificial Intelligence > Vision (1.00)

Neural Information Processing SystemsDec-25-2025, 19:52:30 GMT

Geoclidean: Few-Shot Generalization in Euclidean Geometry

Euclidean geometry is among the earliest forms of mathematical thinking. While the geometric primitives underlying its constructions, such as perfect lines and circles, do not often occur in the natural world, humans rarely struggle to perceive and reason with them. Will computer vision models trained on natural images show the same sensitivity to Euclidean geometry? Here we explore these questions by studying few-shot generalization in the universe of Euclidean geometry constructions. We introduce Geoclidean, a domain-specific language for Euclidean geometry, and use it to generate two datasets of geometric concept learning tasks for benchmarking generalization judgements of humans and machines. We find that humans are indeed sensitive to Euclidean geometry and generalize strongly from a few visual examples of a geometric concept. In contrast, low-level and high-level visual features from standard computer vision models pretrained on natural images do not support correct generalization. Thus Geoclidean represents a novel few-shot generalization benchmark for geometric concept learning, where the performance of humans and of AI models diverge. The Geoclidean framework and dataset are publicly available for download.

euclidean geometry, few-shot generalization, geoclidean, (6 more...)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Neural Information Processing SystemsDec-25-2025, 12:45:58 GMT

Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models

The rapid development of large language and vision models (LLVMs) has been driven by advances in visual instruction tuning. Recently, open-source LLVMs have curated high-quality visual instruction tuning datasets and utilized additional vision encoders or multiple computer vision models in order to narrow the performance gap with powerful closed-source LLVMs. These advancements are attributed to multifaceted information required for diverse capabilities, including fundamental image understanding, real-world knowledge about common-sense and non-object concepts (e.g., charts, diagrams, symbols, signs, and math problems), and step-by-step procedures for solving complex questions. Drawing from the multifaceted information, we present a new efficient LLVM, Mamba-based traversal of rationales (Meteor), which leverages multifaceted rationale to enhance understanding and answering capabilities. To embed lengthy rationales containing abundant information, we employ the Mamba architecture, capable of processing sequential data with linear time complexity. We introduce a new concept of traversal of rationale that facilitates efficient embedding of rationale. Subsequently, the backbone multimodal language model (MLM) is trained to generate answers with the aid of rationale. Through these steps, Meteor achieves significant improvements in vision language performances across multiple evaluation benchmarks requiring diverse capabilities, without scaling up the model size or employing additional vision encoders and computer vision models.

artificial intelligence, mamba-based traversal, rationale, (10 more...)

Technology: Information Technology > Artificial Intelligence > Vision (1.00)

Neural Information Processing SystemsDec-23-2025, 19:21:05 GMT

Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution

The ubiquitous and demonstrably suboptimal choice of resizing images to a fixed resolution before processing them with computer vision models has not yet been successfully challenged.

aspect ratio and resolution, navit, vision transformer, (5 more...)

Technology: Information Technology > Artificial Intelligence > Vision (1.00)

Neural Information Processing SystemsNov-20-2025, 22:33:12 GMT

Adversarial Examples that Fool both Computer Vision and Time-Limited Humans

adversarial example, computer vision, computer vision and time-limited human, (4 more...)

Technology: Information Technology > Artificial Intelligence > Vision (1.00)

Gamaleldin Elsayed, Shreya Shankar, Brian Cheung, Nicolas Papernot, Alexey Kurakin, Ian Goodfellow, Jascha Sohl-Dickstein

Adversarial Examples that Fool both Computer Vision and Time-Limited Humans

Neural Information Processing SystemsNov-20-2025, 17:58:06 GMT

Work done as a member of the Google AI Residency program (g.co/airesidency).

adversarial example, artificial intelligence, machine learning, (14 more...)

Country:

North America > United States > Virginia (0.04)
North America > United States > Pennsylvania (0.04)
North America > United States > Maryland (0.04)
(6 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Security & Privacy (0.94)
Health & Medicine > Therapeutic Area > Neurology (0.48)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Kochnev, Roman, Goodarzi, Arash Torabi, Bentyn, Zofia Antonina, Ignatov, Dmitry, Timofte, Radu

Optuna vs Code Llama: Are LLMs a New Paradigm for Hyperparameter Tuning?

arXiv.org Artificial IntelligenceSep-30-2025

Optimal hyperparameter selection is critical for maximizing the performance of neural networks in computer vision, particularly as architectures become more complex. This work explores the use of large language models (LLMs) for hyperparameter optimization by fine-tuning a parameter-efficient version of Code Llama using LoRA. The resulting model produces accurate and computationally efficient hyperparameter recommendations across a wide range of vision architectures. Unlike traditional methods such as Optuna, which rely on resource-intensive trial-and-error procedures, our approach achieves competitive or superior Root Mean Square Error (RMSE) while substantially reducing computational overhead. Importantly, the models evaluated span image-centric tasks such as classification, detection, and segmentation, fundamental components in many image manipulation pipelines including enhancement, restoration, and style transfer . Our results demonstrate that LLM-based optimization not only rivals established Bayesian methods like Tree-structured Parzen Estimators (TPE), but also accelerates tuning for real-world applications requiring perceptual quality and low-latency processing.

large language model, machine learning, natural language, (19 more...)

2504.06006

Country: North America > United States > Minnesota (0.28)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)

Dubey, Alpana, Kuriakose, Suma Mani, Bhardwaj, Nitish

SynGen-Vision: Synthetic Data Generation for training industrial vision models

arXiv.org Artificial IntelligenceSep-8-2025

We propose an approach to generate synthetic data to train computer vision (CV) models for industrial wear and tear detection. Wear and tear detection is an important CV problem for predictive maintenance tasks in any industry. However, data curation for t raining such models is expensive and time - consuming due to the unavailability of datasets for different wear and tear scenarios. Our approach employs a vision language model along with a 3D simulation and rendering engine to generate synthetic data for var ying rust conditions. We evaluate our approach by training a CV model for rust detection using the generated dataset and tested the trained model on real images of rusted industrial objects. The model trained with the synthetic data generated by our approa ch, outperforms the other approaches with a mAP50 score of 0.87. The approach is customizable and can be easily extended to other industrial wear and tear detection scenarios.

artificial intelligence, machine learning, texture, (14 more...)

2509.04894

Country: Asia > India (0.30)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Ahammed, Abu Shad, Hossain, Md Shahi Amran, Mukherjee, Sayeri, Obermaisser, Roman, Rahman, Md. Ziaur

Integration of Computer Vision with Adaptive Control for Autonomous Driving Using ADORE

arXiv.org Artificial IntelligenceSep-4-2025

Ensuring safety in autonomous driving requires a seamless integration of perception and decision making under uncertain conditions. Although computer vision (CV) models such as YOLO achieve high accuracy in detecting traffic signs and obstacles, their performance degrades in drift scenarios caused by weather variations or unseen objects. This work presents a simulated autonomous driving system that combines a context aware CV model with adaptive control using the ADORE framework. The CARLA simulator was integrated with ADORE via the ROS bridge, allowing real-time communication between perception, decision, and control modules. A simulated test case was designed in both clear and drift weather conditions to demonstrate the robust detection performance of the perception model while ADORE successfully adapted vehicle behavior to speed limits and obstacles with low response latency. The findings highlight the potential of coupling deep learning-based perception with rule-based adaptive decision making to improve automotive safety critical system.

adore, artificial intelligence, machine learning, (16 more...)

2508.17985

Country:

Europe > Germany (0.16)
Asia > Bangladesh (0.14)

Genre: Research Report (0.64)

Industry:

Transportation > Ground > Road (1.00)
Information Technology > Robotics & Automation (1.00)
Automobiles & Trucks (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.55)

Sutton, Oliver J., Zhou, Qinghua, Leete, George, Gorban, Alexander N., Tyukin, Ivan Y.

Staining and locking computer vision models without retraining

arXiv.org Artificial IntelligenceJul-30-2025

We introduce new methods of staining and locking computer vision models, to protect their owners' intellectual property. Staining, also known as watermarking, embeds secret behaviour into a model which can later be used to identify it, while locking aims to make a model unusable unless a secret trigger is inserted into input images. Unlike existing methods, our algorithms can be used to stain and lock pre-trained models without requiring fine-tuning or retraining, and come with provable, computable guarantees bounding their worst-case false positive rates. The stain and lock are implemented by directly modifying a small number of the model's weights and have minimal impact on the (unlocked) model's performance. Locked models are unlocked by inserting a small `trigger patch' into the corner of the input image. We present experimental results showing the efficacy of our methods and demonstrating their practical performance on a variety of computer vision models.

artificial intelligence, deep learning, machine learning, (18 more...)

2507.22

Genre: Research Report > New Finding (0.48)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)